In [1]:
#by convention, we use these shorter two-letter names
import pysal as ps
import pandas as pd
import numpy as np

Spatial Data Processing with PySAL & Pandas

PySAL has two simple ways to read in data. Before using either, you need the path from the directory your notebook is running in to the place where the data is stored. For example, to find where the notebook is running:


In [2]:
!pwd


/home/ljw/dev/ucgis_workshop_2016/notebooks
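The same information is available from inside Python, which helps on systems where `!pwd` is not available. This is a minimal sketch using only the standard library; the `data` folder and the `my_counties.shp` file name are hypothetical placeholders, not files that ship with the workshop.

```python
import os

# Directory the notebook (or script) is currently running in
notebook_dir = os.getcwd()

# Build a path to a data file relative to that directory.
# "data" and "my_counties.shp" are made-up names for illustration only.
my_shp_path = os.path.join(notebook_dir, "data", "my_counties.shp")

print(notebook_dir)
print(my_shp_path)
# Check that the file is really there before trying to open it
print(os.path.exists(my_shp_path))
```

Building paths with `os.path.join` avoids hard-coding the separator, so the same cell should work on Linux, macOS, and Windows.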

PySAL has a command that it uses to get the paths of its example datasets. Let's work with a commonly-used dataset first.


In [3]:
dbf_path = ps.examples.get_path('NAT.dbf')
print(dbf_path)


/home/ljw/.local/lib/python2.7/site-packages/pysal/examples/nat/NAT.dbf

For the purposes of this part of the workshop, we'll use the NAT.dbf example data, and the usjoin.csv data.


In [4]:
csv_path = ps.examples.get_path('usjoin.csv')

Working with shapefiles

To read in a shapefile, we will need the path to the file.


In [5]:
shp_path = ps.examples.get_path('NAT.shp')
print(shp_path)


/home/ljw/.local/lib/python2.7/site-packages/pysal/examples/nat/NAT.shp

Then, we open the file using the ps.open command:


In [6]:
f = ps.open(shp_path)

f is what we call a "file handle." That means that it only points to the data and provides ways to work with it. By itself, it does not read the whole dataset into memory. To see basic information about the file, we can use a few different methods.

For instance, the header of the file, which contains most of the metadata about the file:


In [7]:
f.header


Out[7]:
{'BBOX Mmax': 0.0,
 'BBOX Mmin': 0.0,
 'BBOX Xmax': -66.9698486328125,
 'BBOX Xmin': -124.7314224243164,
 'BBOX Ymax': 49.371734619140625,
 'BBOX Ymin': 24.95596694946289,
 'BBOX Zmax': 0.0,
 'BBOX Zmin': 3.754550197104843e+72,
 'File Code': 9994,
 'File Length': 731108,
 'Shape Type': 5,
 'Unused0': 0,
 'Unused1': 0,
 'Unused2': 0,
 'Unused3': 0,
 'Unused4': 0,
 'Version': 1000}
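The keys in this dictionary mirror the fixed 100-byte header that every .shp file starts with. As a rough sketch of where those values come from, the block below decodes such a header with the standard `struct` module. It is not PySAL's own reader, the function name `parse_shp_header` is made up for illustration, and the header is built by hand from the values printed above rather than read from NAT.shp.

```python
import struct

def parse_shp_header(raw):
    """Decode the fixed 100-byte header at the start of a .shp file."""
    # Bytes 0-27: seven big-endian 32-bit integers
    # (file code, five unused fields, file length)
    ints_be = struct.unpack(">7i", raw[0:28])
    # Bytes 28-35: two little-endian 32-bit integers
    version, shape_type = struct.unpack("<2i", raw[28:36])
    # Bytes 36-99: eight little-endian doubles (the bounding box)
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack("<8d", raw[36:100])
    return {
        "File Code": ints_be[0],
        "File Length": ints_be[6],   # counted in 16-bit words, not bytes
        "Version": version,
        "Shape Type": shape_type,    # 5 means Polygon
        "BBOX Xmin": xmin, "BBOX Ymin": ymin,
        "BBOX Xmax": xmax, "BBOX Ymax": ymax,
    }

# Hand-built header using the values shown in the output above
raw_header = (struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 731108)
              + struct.pack("<2i", 1000, 5)
              + struct.pack("<8d", -124.7314224243164, 24.95596694946289,
                            -66.9698486328125, 49.371734619140625,
                            0.0, 0.0, 0.0, 0.0))

header = parse_shp_header(raw_header)
size_in_bytes = header["File Length"] * 2
print(header["Shape Type"], size_in_bytes)
```

Two details are easy to miss when reading the header: the file length is stored in 16-bit words, so it must be doubled to get bytes, and the shape type is a numeric code (5 for polygons).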

To actually read the shapes in from the file, you can use the following commands:


In [8]:
f.by_row(14) #gets the 14th shape from the file


Out[8]:
<pysal.cg.shapes.Polygon at 0x7f5d0d149e90>

In [9]:
all_polygons = f.read() #reads in all polygons from the file

In [10]:
len(all_polygons)


Out[10]:
3085

So, all 3085 polygons have been read in from the file. These are stored in PySAL shape objects, which can be used by PySAL and can be converted to other Python shape objects.

Shape objects typically have a few methods and properties. Since we've read in polygonal data, we can get some properties of the polygons. Let's have a look at the first polygon:


In [11]:
all_polygons[0].centroid #the centroid of the first polygon


Out[11]:
(-94.90336786329912, 48.771730563701574)

In [12]:
all_polygons[0].area


Out[12]:
0.565411079543992

In [13]:
all_polygons[0].perimeter


Out[13]:
4.055313773836516
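Because the coordinates in this file are longitudes and latitudes, the area and perimeter above are expressed in degree units, not kilometers. To make the three properties concrete, here is a small sketch that computes them for a simple polygon with the standard shoelace formulas. The function name `polygon_properties` is hypothetical, and PySAL's own implementation may differ in detail.

```python
import math

def polygon_properties(vertices):
    """Return (area, perimeter, centroid) of a simple polygon.

    `vertices` is a list of (x, y) pairs in order around the ring;
    the closing edge back to the first vertex is added automatically.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid

# A 2-by-1 rectangle as a toy example
area, perimeter, centroid = polygon_properties([(0, 0), (2, 0), (2, 1), (0, 1)])
print(area, perimeter, centroid)
```

The sketch assumes a single ring without holes; counties made of several parts would need each ring handled separately.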

While in the Jupyter Notebook, you can examine what properties an object has by using the tab key.


In [14]:
polygon = all_polygons[0]

In [15]:
polygon. #press tab when the cursor is right after the dot


  File "<ipython-input-15-aa03438a2fa8>", line 1
    polygon. #press tab when the cursor is right after the dot
                                                              ^
SyntaxError: invalid syntax
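The SyntaxError above is expected when the cell is run instead of tab-completed. Outside the notebook, or when tab completion is unavailable, the built-in `dir` function gives a similar listing. The `Shape` class below is a made-up stand-in so the sketch runs without PySAL.

```python
class Shape(object):
    """Hypothetical stand-in for a polygon object."""

    def __init__(self):
        self.vertices = [(0, 0), (1, 0), (1, 1)]

    @property
    def centroid(self):
        # For a triangle, the centroid is the average of the vertices
        xs, ys = zip(*self.vertices)
        return (sum(xs) / float(len(xs)), sum(ys) / float(len(ys)))

shape = Shape()

# Keep only the public names (those not starting with an underscore)
public_names = [name for name in dir(shape) if not name.startswith("_")]
print(public_names)
```

Applying the same list comprehension to `polygon` from the cell above should list its public methods and properties.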

Working with Data Tables

When you're working with tables of data, like a csv or dbf, you can extract your data in the following way. Let's open the dbf file we got the path for above.


In [16]:
f = ps.open(dbf_path)

Just like with the shapefile, we can examine the header of the dbf file:


In [17]:
f.header


Out[17]:
[u'NAME',
 u'STATE_NAME',
 u'STATE_FIPS',
 u'CNTY_FIPS',
 u'FIPS',
 u'STFIPS',
 u'COFIPS',
 u'FIPSNO',
 u'SOUTH',
 u'HR60',
 u'HR70',
 u'HR80',
 u'HR90',
 u'HC60',
 u'HC70',
 u'HC80',
 u'HC90',
 u'PO60',
 u'PO70',
 u'PO80',
 u'PO90',
 u'RD60',
 u'RD70',
 u'RD80',
 u'RD90',
 u'PS60',
 u'PS70',
 u'PS80',
 u'PS90',
 u'UE60',
 u'UE70',
 u'UE80',
 u'UE90',
 u'DV60',
 u'DV70',
 u'DV80',
 u'DV90',
 u'MA60',
 u'MA70',
 u'MA80',
 u'MA90',
 u'POL60',
 u'POL70',
 u'POL80',
 u'POL90',
 u'DNL60',
 u'DNL70',
 u'DNL80',
 u'DNL90',
 u'MFIL59',
 u'MFIL69',
 u'MFIL79',
 u'MFIL89',
 u'FP59',
 u'FP69',
 u'FP79',
 u'FP89',
 u'BLK60',
 u'BLK70',
 u'BLK80',
 u'BLK90',
 u'GI59',
 u'GI69',
 u'GI79',
 u'GI89',
 u'FH60',
 u'FH70',
 u'FH80',
 u'FH90']

So, the header is a list containing the names of all of the fields we can read. Suppose we were interested in the ['NAME', 'STATE_NAME', 'HR90', 'HR80'] fields.

If we just wanted to grab one field of interest, HR90, we can use either by_col or by_col_array, depending on the format we want the resulting data in:


In [18]:
HR90 = f.by_col('HR90')
print(type(HR90).__name__, HR90[0:5])
HR90 = f.by_col_array('HR90')
print(type(HR90).__name__, HR90[0:5])


('list', [0.0, 15.885623511, 6.4624531472, 6.9965017491, 7.4780332772])
('ndarray', array([[  0.        ],
       [ 15.88562351],
       [  6.46245315],
       [  6.99650175],
       [  7.47803328]]))

As you can see, the by_col function returns a list of data, with no shape. It can only return one column at a time:


In [19]:
HRs = f.by_col('HR90', 'HR80')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-1fef6a3c3a50> in <module>()
----> 1 HRs = f.by_col('HR90', 'HR80')

TypeError: __call__() takes exactly 2 arguments (3 given)

This error message is called a "traceback," as you see in the top right, and it usually provides feedback on why the previous command did not execute correctly. Here, you see that one too many arguments were provided to __call__, which tells us that by_col accepts only a single column name.
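The counts in the message ("2 arguments (3 given)") include the object itself, which Python passes implicitly as `self`. The sketch below reproduces the situation with a made-up `ColumnReader` class; it only imitates the calling convention and is not PySAL's code.

```python
class ColumnReader(object):
    """Hypothetical stand-in for by_col: callable with exactly one column name."""

    def __init__(self, table):
        self.table = table

    def __call__(self, key):
        # Python passes the object as `self`, then the caller's one argument
        return self.table[key]

by_col = ColumnReader({"HR90": [0.0, 15.9], "HR80": [8.9, 17.2]})

message = None
try:
    by_col("HR90", "HR80")   # two names: one argument too many
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Asking for one column at a time works
columns = [by_col(name) for name in ("HR90", "HR80")]
print(columns)
```

The exact wording of the message depends on the Python version, but the cause is the same: the callable accepts a single column name.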

If you want to read in many columns at once and store them to an array, use by_col_array:


In [20]:
HRs = f.by_col_array('HR90', 'HR80')

In [21]:
HRs


Out[21]:
array([[  0.        ,   8.85582713],
       [ 15.88562351,  17.20874204],
       [  6.46245315,   3.4507747 ],
       ..., 
       [  4.36732988,   5.2803488 ],
       [  3.72771194,   3.00003   ],
       [  2.04885495,   1.19474313]])
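Once the columns are in a two-dimensional array, individual variables can be pulled back out by column position. A short sketch with NumPy, using the first three rows of the output above as stand-in data:

```python
import numpy as np

# First three rows of the array shown above (columns: HR90, HR80)
HRs_sample = np.array([[0.0, 8.85582713],
                       [15.88562351, 17.20874204],
                       [6.46245315, 3.4507747]])

hr90 = HRs_sample[:, 0]            # first column as a flat, 1-D array
hr90_column = HRs_sample[:, [0]]   # same values, kept as an (n, 1) column
change = HRs_sample[:, 0] - HRs_sample[:, 1]   # element-wise HR90 minus HR80

print(hr90.shape, hr90_column.shape)
print(change)
```

Indexing with a list (`[0]`) keeps the column dimension, which matches the shape that by_col_array returns for a single column.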

It is best to use by_col_array on data of a single type. That is, if you read in a lot of columns, some of them numbers and some of them strings, all columns will get converted to the same datatype:


In [22]:
allcolumns = f.by_col_array(['NAME', 'STATE_NAME', 'HR90', 'HR80'])

In [23]:
allcolumns


Out[23]:
array([[u'Lake of the Woods', u'Minnesota', u'0.0', u'8.8558271343'],
       [u'Ferry', u'Washington', u'15.885623511', u'17.208742041'],
       [u'Stevens', u'Washington', u'6.4624531472', u'3.4507746989'],
       ..., 
       [u'York', u'Virginia', u'4.3673298769', u'5.2803488048'],
       [u'Prince William', u'Virginia', u'3.7277119437', u'3.0000300003'],
       [u'Gallatin', u'Montana', u'2.0488549462', u'1.1947431302']], 
      dtype='<U20')

Note that the numerical columns, HR90 and HR80, are now stored as strings: their values show up wrapped in quotation marks, like u'0.0'.
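This coercion comes from NumPy, which stores a single datatype per array. The sketch below shows the effect on two rows copied from the output above, and one way to recover the numbers, assuming the numeric columns sit in known positions.

```python
import numpy as np

rows = [["Ferry", "Washington", 15.885623511, 17.208742041],
        ["Stevens", "Washington", 6.4624531472, 3.4507746989]]

# Mixing strings and floats forces every entry to become a string
mixed = np.array(rows)
print(mixed.dtype.kind)   # kind "U" indicates unicode strings on Python 3

# Slice out the numeric columns and convert them back to floats
rates = mixed[:, 2:].astype(float)
print(rates)
```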

These methods work similarly for .csv files as well.

Using Pandas with PySAL

A recent addition to PySAL allows you to work with shapefile/dbf pairs using Pandas. This optional extension is only turned on if you have Pandas installed. The extension is the ps.pdio module:


In [24]:
ps.pdio


Out[24]:
<module 'pysal.contrib.pdutilities' from '/home/ljw/.local/lib/python2.7/site-packages/pysal/contrib/pdutilities/__init__.pyc'>

To use it, you can read in shapefile/dbf pairs using the ps.pdio.read_files command.


In [25]:
data_table = ps.pdio.read_files(shp_path)

This reads in the entire database table and adds a column to the end, called geometry, that stores the geometries read in from the shapefile.

Now, you can work with it like a standard pandas dataframe.


In [26]:
data_table.head()


Out[26]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
0 Lake of the Woods Minnesota 27 077 27077 27 77 27077 0 0.000000 ... 0.024534 0.285235 0.372336 0.342104 0.336455 11.279621 5.4 5.663881 9.515860 <pysal.cg.shapes.Polygon object at 0x7f5d0d149...
1 Ferry Washington 53 019 53019 53 19 53019 0 0.000000 ... 0.317712 0.256158 0.360665 0.361928 0.360640 10.053476 2.6 10.079576 11.397059 <pysal.cg.shapes.Polygon object at 0x7f5d0c6af...
2 Stevens Washington 53 065 53065 53 65 53065 0 1.863863 ... 0.210030 0.283999 0.394083 0.357566 0.369942 9.258437 5.6 6.812127 10.352015 <pysal.cg.shapes.Polygon object at 0x7f5d0c6af...
3 Okanogan Washington 53 047 53047 53 47 53047 0 2.612330 ... 0.155922 0.258540 0.371218 0.381240 0.394519 9.039900 8.1 10.084926 12.840340 <pysal.cg.shapes.Polygon object at 0x7f5d0c6af...
4 Pend Oreille Washington 53 051 53051 53 51 53051 0 0.000000 ... 0.134605 0.243263 0.365614 0.358706 0.387848 8.243930 4.1 7.557643 10.313002 <pysal.cg.shapes.Polygon object at 0x7f5d0c6af...

5 rows × 70 columns

The read_files function only works on shapefile/dbf pairs. If you need to read in data from CSVs, use pandas directly:


In [27]:
usjoin = pd.read_csv(csv_path)
#usjoin = ps.pdio.read_files(csv_path) #will not work, not a shp/dbf pair

The nice thing about working with pandas dataframes is that they have very powerful baked-in support for relational-style queries. By this, I mean that it is very easy to find things like:

The number of counties in each state:


In [28]:
data_table.groupby("STATE_NAME").size()


Out[28]:
STATE_NAME
Alabama                  67
Arizona                  14
Arkansas                 75
California               58
Colorado                 63
Connecticut               8
Delaware                  3
District of Columbia      1
Florida                  67
Georgia                 159
Idaho                    44
Illinois                102
Indiana                  92
Iowa                     99
Kansas                  105
Kentucky                120
Louisiana                64
Maine                    16
Maryland                 24
Massachusetts            12
Michigan                 83
Minnesota                87
Mississippi              82
Missouri                115
Montana                  55
Nebraska                 93
Nevada                   17
New Hampshire            10
New Jersey               21
New Mexico               32
New York                 58
North Carolina          100
North Dakota             53
Ohio                     88
Oklahoma                 77
Oregon                   36
Pennsylvania             67
Rhode Island              5
South Carolina           46
South Dakota             66
Tennessee                95
Texas                   254
Utah                     29
Vermont                  14
Virginia                123
Washington               38
West Virginia            55
Wisconsin                70
Wyoming                  23
dtype: int64
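The same groupby pattern extends from counting rows to summarizing a column within each group. A minimal sketch on a toy table; the column names follow the NAT data, but the HR90 values here are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy table: real county/state names, made-up HR90 values
toy = pd.DataFrame({
    "NAME": ["Ferry", "Stevens", "Cook", "Will", "Navajo"],
    "STATE_NAME": ["Washington", "Washington", "Illinois", "Illinois", "Arizona"],
    "HR90": [1.0, 2.0, 3.0, 4.0, 5.0],
})

counts = toy.groupby("STATE_NAME").size()             # rows per state
mean_hr90 = toy.groupby("STATE_NAME")["HR90"].mean()  # average per state

print(counts)
print(mean_hr90)
```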

Or, to get the rows of the table that are in Arizona, we can use the query function of the dataframe:


In [29]:
data_table.query('STATE_NAME == "Arizona"')


Out[29]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
1707 Navajo Arizona 04 017 04017 4 17 4017 0 5.263989 ... 0.905251 0.366863 0.414135 0.401999 0.445299 13.146998 12.1 13.762783 18.033782 <pysal.cg.shapes.Polygon object at 0x7f5d0be5c...
1708 Coconino Arizona 04 005 04005 4 5 4005 0 3.185449 ... 1.469081 0.301222 0.377785 0.381655 0.403188 9.475171 8.5 11.181563 15.267643 <pysal.cg.shapes.Polygon object at 0x7f5d0be5c...
1722 Mohave Arizona 04 015 04015 4 15 4015 0 0.000000 ... 0.324075 0.279339 0.347150 0.375790 0.374383 11.508554 4.8 7.018268 9.214294 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...
1726 Apache Arizona 04 001 04001 4 1 4001 0 10.951223 ... 0.162361 0.395913 0.450552 0.431013 0.489132 15.014738 14.6 18.727548 22.933635 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...
2002 Yavapai Arizona 04 025 04025 4 25 4025 0 3.458771 ... 0.298011 0.289509 0.378195 0.376313 0.384089 9.930032 8.6 7.516372 9.483521 <pysal.cg.shapes.Polygon object at 0x7f5d0bd62...
2182 Gila Arizona 04 007 04007 4 7 4007 0 6.473749 ... 0.246171 0.265294 0.337519 0.353848 0.386976 10.470261 8.1 9.934237 11.706102 <pysal.cg.shapes.Polygon object at 0x7f5d0bce4...
2262 Maricopa Arizona 04 013 04013 4 13 4013 0 6.179259 ... 3.499221 0.277828 0.352374 0.366015 0.372756 10.642382 9.8 11.857260 14.404902 <pysal.cg.shapes.Polygon object at 0x7f5d0bc36...
2311 Greenlee Arizona 04 011 04011 4 11 4011 0 2.896284 ... 0.349650 0.177691 0.257158 0.283518 0.337256 9.806115 6.7 5.295110 10.453284 <pysal.cg.shapes.Polygon object at 0x7f5d0bbf6...
2326 Graham Arizona 04 009 04009 4 9 4009 0 4.746648 ... 1.890487 0.310256 0.362926 0.383554 0.408379 11.979335 10.1 11.961367 16.129032 <pysal.cg.shapes.Polygon object at 0x7f5d0bbf6...
2353 Pinal Arizona 04 021 04021 4 21 4021 0 13.828390 ... 3.134586 0.304294 0.369974 0.361193 0.400130 10.822965 8.8 10.341699 15.304144 <pysal.cg.shapes.Polygon object at 0x7f5d0bc14...
2499 Pima Arizona 04 019 04019 4 19 4019 0 5.520841 ... 3.118252 0.268266 0.367218 0.375039 0.392144 11.381626 10.2 12.689768 16.163178 <pysal.cg.shapes.Polygon object at 0x7f5d0bb92...
2514 Cochise Arizona 04 003 04003 4 3 4003 0 4.845049 ... 5.201590 0.261208 0.359500 0.359701 0.399208 10.197573 8.7 9.912732 13.733872 <pysal.cg.shapes.Polygon object at 0x7f5d0bb92...
2615 Santa Cruz Arizona 04 023 04023 4 23 4023 0 9.252406 ... 0.326863 0.327130 0.396807 0.393240 0.413795 19.007213 14.7 15.690913 18.272244 <pysal.cg.shapes.Polygon object at 0x7f5d0bb60...
3080 La Paz Arizona 04 012 04012 4 12 4012 0 5.046682 ... 2.628811 0.271556 0.364110 0.372662 0.405743 9.216414 8.0 9.296093 12.379134 <pysal.cg.shapes.Polygon object at 0x7f5d0b9d8...

14 rows × 70 columns

Behind the scenes, this uses a fast vectorized library, numexpr (when it is installed), to essentially do the following.

First, compare each row's STATE_NAME column to 'Arizona' and return True if the row matches:


In [30]:
data_table.STATE_NAME == 'Arizona'


Out[30]:
0       False
1       False
2       False
3       False
4       False
5       False
6       False
7       False
8       False
9       False
10      False
11      False
12      False
13      False
14      False
15      False
16      False
17      False
18      False
19      False
20      False
21      False
22      False
23      False
24      False
25      False
26      False
27      False
28      False
29      False
        ...  
3055    False
3056    False
3057    False
3058    False
3059    False
3060    False
3061    False
3062    False
3063    False
3064    False
3065    False
3066    False
3067    False
3068    False
3069    False
3070    False
3071    False
3072    False
3073    False
3074    False
3075    False
3076    False
3077    False
3078    False
3079    False
3080     True
3081    False
3082    False
3083    False
3084    False
Name: STATE_NAME, dtype: bool

Then, use that to keep only the rows where the condition is true:


In [31]:
data_table[data_table.STATE_NAME == 'Arizona']


Out[31]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
1707 Navajo Arizona 04 017 04017 4 17 4017 0 5.263989 ... 0.905251 0.366863 0.414135 0.401999 0.445299 13.146998 12.1 13.762783 18.033782 <pysal.cg.shapes.Polygon object at 0x7f5d0be5c...
1708 Coconino Arizona 04 005 04005 4 5 4005 0 3.185449 ... 1.469081 0.301222 0.377785 0.381655 0.403188 9.475171 8.5 11.181563 15.267643 <pysal.cg.shapes.Polygon object at 0x7f5d0be5c...
1722 Mohave Arizona 04 015 04015 4 15 4015 0 0.000000 ... 0.324075 0.279339 0.347150 0.375790 0.374383 11.508554 4.8 7.018268 9.214294 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...
1726 Apache Arizona 04 001 04001 4 1 4001 0 10.951223 ... 0.162361 0.395913 0.450552 0.431013 0.489132 15.014738 14.6 18.727548 22.933635 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...
2002 Yavapai Arizona 04 025 04025 4 25 4025 0 3.458771 ... 0.298011 0.289509 0.378195 0.376313 0.384089 9.930032 8.6 7.516372 9.483521 <pysal.cg.shapes.Polygon object at 0x7f5d0bd62...
2182 Gila Arizona 04 007 04007 4 7 4007 0 6.473749 ... 0.246171 0.265294 0.337519 0.353848 0.386976 10.470261 8.1 9.934237 11.706102 <pysal.cg.shapes.Polygon object at 0x7f5d0bce4...
2262 Maricopa Arizona 04 013 04013 4 13 4013 0 6.179259 ... 3.499221 0.277828 0.352374 0.366015 0.372756 10.642382 9.8 11.857260 14.404902 <pysal.cg.shapes.Polygon object at 0x7f5d0bc36...
2311 Greenlee Arizona 04 011 04011 4 11 4011 0 2.896284 ... 0.349650 0.177691 0.257158 0.283518 0.337256 9.806115 6.7 5.295110 10.453284 <pysal.cg.shapes.Polygon object at 0x7f5d0bbf6...
2326 Graham Arizona 04 009 04009 4 9 4009 0 4.746648 ... 1.890487 0.310256 0.362926 0.383554 0.408379 11.979335 10.1 11.961367 16.129032 <pysal.cg.shapes.Polygon object at 0x7f5d0bbf6...
2353 Pinal Arizona 04 021 04021 4 21 4021 0 13.828390 ... 3.134586 0.304294 0.369974 0.361193 0.400130 10.822965 8.8 10.341699 15.304144 <pysal.cg.shapes.Polygon object at 0x7f5d0bc14...
2499 Pima Arizona 04 019 04019 4 19 4019 0 5.520841 ... 3.118252 0.268266 0.367218 0.375039 0.392144 11.381626 10.2 12.689768 16.163178 <pysal.cg.shapes.Polygon object at 0x7f5d0bb92...
2514 Cochise Arizona 04 003 04003 4 3 4003 0 4.845049 ... 5.201590 0.261208 0.359500 0.359701 0.399208 10.197573 8.7 9.912732 13.733872 <pysal.cg.shapes.Polygon object at 0x7f5d0bb92...
2615 Santa Cruz Arizona 04 023 04023 4 23 4023 0 9.252406 ... 0.326863 0.327130 0.396807 0.393240 0.413795 19.007213 14.7 15.690913 18.272244 <pysal.cg.shapes.Polygon object at 0x7f5d0bb60...
3080 La Paz Arizona 04 012 04012 4 12 4012 0 5.046682 ... 2.628811 0.271556 0.364110 0.372662 0.405743 9.216414 8.0 9.296093 12.379134 <pysal.cg.shapes.Polygon object at 0x7f5d0b9d8...

14 rows × 70 columns
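Because the filter is just a column of True/False values, several filters can be combined with `&` (and), `|` (or), and `~` (not) before indexing the table. A small sketch on a toy table with only the two columns needed:

```python
import pandas as pd

toy = pd.DataFrame({
    "NAME": ["Cook", "Cook", "Lake", "Lake"],
    "STATE_NAME": ["Illinois", "Georgia", "Illinois", "Indiana"],
})

is_cook = toy.NAME == "Cook"
in_illinois = toy.STATE_NAME == "Illinois"

both = toy[is_cook & in_illinois]         # Cook County, Illinois only
either = toy[is_cook | in_illinois]       # any Cook, or anything in Illinois
neither = toy[~(is_cook | in_illinois)]   # everything else

print(both)
print(either)
print(neither)
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, as in `toy[(toy.NAME == "Cook") & (toy.STATE_NAME == "Illinois")]`, because `&` binds more tightly than `==`.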

We might need this behind-the-scenes knowledge when we want to chain together conditions, or when we need to do spatial queries.

This is because spatial queries are somewhat more complex. Let's say, for example, we want all of the counties in the US west of longitude -121. We need a way to express that question. Ideally, we want something like:

SELECT
        *
FROM
        data_table
WHERE
        x_centroid < -121

So, let's refer to an arbitrary polygon in the dataframe's geometry column as poly. The centroid of a PySAL polygon is stored as an (X,Y) pair, so the longitude is the first element of the pair, poly.centroid[0].

Then, applying this condition to each geometry, we get the same kind of filter we used above to grab only counties in Arizona:


In [32]:
data_table.geometry.apply(lambda poly: poly.centroid[0] < -121)


Out[32]:
0       False
1       False
2       False
3       False
4       False
5       False
6       False
7       False
8       False
9       False
10      False
11      False
12      False
13      False
14      False
15      False
16      False
17      False
18      False
19      False
20      False
21      False
22      False
23      False
24      False
25      False
26      False
27       True
28      False
29      False
        ...  
3055    False
3056    False
3057    False
3058    False
3059    False
3060    False
3061    False
3062    False
3063    False
3064    False
3065    False
3066    False
3067    False
3068    False
3069    False
3070    False
3071    False
3072    False
3073    False
3074    False
3075    False
3076    False
3077    False
3078    False
3079    False
3080    False
3081    False
3082    False
3083    False
3084    False
Name: geometry, dtype: bool

If we use this as a filter on the table, we can get only the rows that match that condition, just like we did for the STATE_NAME query:


In [33]:
data_table[data_table.geometry.apply(lambda x: x.centroid[0] < -121)]


Out[33]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
27 Whatcom Washington 53 073 53073 53 73 53073 0 1.422131 ... 0.508687 0.247630 0.346935 0.369436 0.358418 9.174415 7.1 9.718054 11.135022 <pysal.cg.shapes.Polygon object at 0x7f5d0c349...
31 Skagit Washington 53 057 53057 53 57 53057 0 2.596560 ... 0.351958 0.239346 0.344830 0.364623 0.362265 8.611518 7.9 10.480031 11.382484 <pysal.cg.shapes.Polygon object at 0x7f5d0c349...
44 Clallam Washington 53 009 53009 53 9 53009 0 3.330891 ... 0.568504 0.240573 0.349320 0.361619 0.366854 8.788882 6.5 9.660900 12.281690 <pysal.cg.shapes.Polygon object at 0x7f5d0c349...
47 Snohomish Washington 53 061 53061 53 61 53061 0 2.129319 ... 1.023748 0.234133 0.300980 0.331988 0.325067 8.244432 7.4 10.701071 12.202467 <pysal.cg.shapes.Polygon object at 0x7f5d0c349...
48 Island Washington 53 029 53029 53 29 53029 0 0.000000 ... 2.415483 0.252990 0.357682 0.369762 0.350930 8.854708 6.8 8.505577 8.509372 <pysal.cg.shapes.Polygon object at 0x7f5d0c349...
57 Jefferson Washington 53 031 53031 53 31 53031 0 0.000000 ... 0.416956 0.229855 0.326232 0.379629 0.381017 8.978723 6.0 9.088992 10.860725 <pysal.cg.shapes.Polygon object at 0x7f5d0c36a...
71 Kitsap Washington 53 035 53035 53 35 53035 0 0.791991 ... 2.691706 0.241070 0.315654 0.338109 0.347093 9.499651 8.1 9.760890 11.430652 <pysal.cg.shapes.Polygon object at 0x7f5d0c36a...
80 King Washington 53 033 53033 53 33 53033 0 3.351108 ... 5.061238 0.250251 0.317439 0.346038 0.335487 10.452660 9.8 12.742441 14.089579 <pysal.cg.shapes.Polygon object at 0x7f5d0c36a...
85 Mason Washington 53 045 53045 53 45 53045 0 0.000000 ... 0.865914 0.243041 0.342107 0.357727 0.361988 7.596109 6.1 9.091962 9.157372 <pysal.cg.shapes.Polygon object at 0x7f5d0c36a...
92 Grays Harbor Washington 53 027 53027 53 27 53027 0 1.224028 ... 0.185430 0.253515 0.338063 0.355237 0.379909 9.881310 8.7 10.305788 13.887620 <pysal.cg.shapes.Polygon object at 0x7f5d0c30c...
107 Pierce Washington 53 053 53053 53 53 53053 0 2.176685 ... 7.200577 0.251915 0.340964 0.359040 0.355849 10.097699 9.6 12.245220 15.130166 <pysal.cg.shapes.Polygon object at 0x7f5d0c30c...
116 Thurston Washington 53 067 53067 53 67 53067 0 1.211042 ... 1.776256 0.257806 0.333057 0.344955 0.343702 9.231516 8.3 11.253655 12.657732 <pysal.cg.shapes.Polygon object at 0x7f5d0c30c...
130 Pacific Washington 53 049 53049 53 49 53049 0 0.000000 ... 0.301875 0.268615 0.363945 0.379914 0.393149 8.792049 6.7 8.486192 12.150046 <pysal.cg.shapes.Polygon object at 0x7f5d0c328...
131 Lewis Washington 53 041 53041 53 41 53041 0 1.592686 ... 0.318407 0.274614 0.349660 0.366018 0.375737 9.287954 7.8 9.428291 11.757922 <pysal.cg.shapes.Polygon object at 0x7f5d0c328...
169 Skamania Washington 53 059 53059 53 59 53059 0 0.000000 ... 0.060321 0.209437 0.292057 0.332103 0.351568 6.478659 7.2 6.893424 10.092552 <pysal.cg.shapes.Polygon object at 0x7f5d0c2c5...
170 Cowlitz Washington 53 015 53015 53 15 53015 0 0.576691 ... 0.350711 0.233591 0.311116 0.340234 0.364415 8.280594 7.4 10.562645 12.664654 <pysal.cg.shapes.Polygon object at 0x7f5d0c2c5...
171 Wahkiakum Washington 53 069 53069 53 69 53069 0 0.000000 ... 0.090171 0.295670 0.331690 0.392993 0.356290 7.094595 3.7 5.722326 3.433476 <pysal.cg.shapes.Polygon object at 0x7f5d0c2c5...
181 Clatsop Oregon 41 007 41007 41 7 41007 0 3.652301 ... 0.342332 0.257003 0.346553 0.370412 0.372314 9.032872 8.8 11.249418 11.210713 <pysal.cg.shapes.Polygon object at 0x7f5d0c2c5...
183 Columbia Oregon 41 009 41009 41 9 41009 0 0.000000 ... 0.111830 0.247079 0.332061 0.331716 0.344570 7.871321 6.4 8.386444 9.064105 <pysal.cg.shapes.Polygon object at 0x7f5d0c2df...
193 Clark Washington 53 011 53011 53 11 53011 0 1.421328 ... 1.250142 0.236781 0.314911 0.334015 0.347354 8.734203 8.3 11.253167 13.134993 <pysal.cg.shapes.Polygon object at 0x7f5d0c2df...
216 Tillamook Oregon 41 057 41057 41 57 41057 0 0.000000 ... 0.180807 0.255654 0.366067 0.370520 0.375534 6.600249 6.8 8.000000 10.445818 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
217 Washington Oregon 41 067 41067 41 67 41067 0 0.722776 ... 0.660560 0.263289 0.312470 0.336796 0.332442 7.043366 6.5 10.856225 11.681524 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
224 Multnomah Oregon 41 051 41051 41 51 41051 0 2.869095 ... 6.017089 0.249353 0.341155 0.363429 0.368427 12.112316 11.7 15.648251 17.479222 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
225 Hood River Oregon 41 027 41027 41 27 41027 0 2.488491 ... 0.272141 0.236401 0.363113 0.350572 0.361331 7.021035 7.0 8.696639 9.796632 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
226 Wasco Oregon 41 065 41065 41 65 41065 0 4.949270 ... 0.308998 0.238234 0.318744 0.355666 0.371137 8.290056 7.3 9.605071 12.473225 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
240 Clackamas Oregon 41 005 41005 41 5 41005 0 2.064203 ... 0.406670 0.260309 0.328264 0.345012 0.341392 7.757834 6.8 9.549100 10.391906 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
242 Yamhill Oregon 41 071 41071 41 71 41071 0 2.052672 ... 0.564446 0.279921 0.356719 0.351616 0.360027 8.771930 7.1 9.678927 11.561807 <pysal.cg.shapes.Polygon object at 0x7f5d0c27c...
253 Marion Oregon 41 047 41047 41 47 41047 0 2.205899 ... 0.933111 0.267573 0.350250 0.359699 0.360931 9.755760 8.9 12.081406 14.314342 <pysal.cg.shapes.Polygon object at 0x7f5d0c29d...
269 Polk Oregon 41 053 41053 41 53 41053 0 2.513542 ... 0.403706 0.259774 0.367424 0.343749 0.354393 7.590948 7.1 9.383730 12.331495 <pysal.cg.shapes.Polygon object at 0x7f5d0c29d...
272 Lincoln Oregon 41 041 41041 41 41 41041 0 1.353088 ... 0.174857 0.271120 0.381057 0.381628 0.367329 7.280675 6.6 8.836895 11.156877 <pysal.cg.shapes.Polygon object at 0x7f5d0c29d...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
493 Curry Oregon 41 015 41015 41 15 41015 0 0.000000 ... 0.160397 0.217748 0.362842 0.375197 0.376336 6.471072 6.0 7.086922 8.707212 <pysal.cg.shapes.Polygon object at 0x7f5d0c1cf...
517 Josephine Oregon 41 033 41033 41 33 41033 0 0.000000 ... 0.202717 0.277720 0.380936 0.374572 0.388495 7.789162 7.4 10.601294 13.166371 <pysal.cg.shapes.Polygon object at 0x7f5d0c1e6...
594 Siskiyou California 06 093 06093 6 93 6093 0 3.040900 ... 1.587375 0.236986 0.341639 0.364010 0.391970 7.830080 6.7 8.927778 13.247093 <pysal.cg.shapes.Polygon object at 0x7f5d0c1a2...
601 Del Norte California 06 015 06015 6 15 6015 0 1.875715 ... 3.657289 0.232508 0.340163 0.375624 0.387647 6.653043 7.7 13.256139 15.687963 <pysal.cg.shapes.Polygon object at 0x7f5d0c1a2...
687 Humboldt California 06 023 06023 6 23 6023 0 2.860085 ... 0.805924 0.244980 0.350058 0.366595 0.383712 8.730068 8.5 13.044418 15.784278 <pysal.cg.shapes.Polygon object at 0x7f5d0c15b...
709 Trinity California 06 105 06105 6 105 6105 0 24.040113 ... 0.405726 0.233517 0.341123 0.359908 0.401714 5.607100 4.6 5.916813 12.469701 <pysal.cg.shapes.Polygon object at 0x7f5d0c15b...
735 Shasta California 06 089 06089 6 89 6089 0 3.923679 ... 0.735194 0.255054 0.350599 0.380390 0.386223 8.809832 9.3 11.021983 14.385732 <pysal.cg.shapes.Polygon object at 0x7f5d0c172...
907 Tehama California 06 103 06103 6 103 6103 0 1.317263 ... 0.515869 0.252703 0.351867 0.369805 0.382934 8.440284 7.5 9.523380 12.770687 <pysal.cg.shapes.Polygon object at 0x7f5d0c0f3...
974 Butte California 06 007 06007 6 7 6007 0 2.438132 ... 1.296398 0.271483 0.386386 0.386655 0.388259 9.549072 9.5 11.376733 14.197637 <pysal.cg.shapes.Polygon object at 0x7f5d0c0a0...
1005 Mendocino California 06 045 06045 6 45 6045 0 3.917037 ... 0.631029 0.253817 0.356428 0.368534 0.381629 9.223066 8.7 12.097572 15.145840 <pysal.cg.shapes.Polygon object at 0x7f5d0c039...
1047 Glenn California 06 021 06021 6 21 6021 0 1.932927 ... 0.552464 0.276480 0.359307 0.380291 0.397142 8.764201 6.4 10.187253 13.478858 <pysal.cg.shapes.Polygon object at 0x7f5d0c04e...
1090 Yuba California 06 115 06115 6 115 6115 0 6.891324 ... 4.185272 0.263611 0.371805 0.377216 0.384870 8.480176 8.8 12.686114 16.302121 <pysal.cg.shapes.Polygon object at 0x7f5d0c067...
1101 Lake California 06 033 06033 6 33 6033 0 7.253736 ... 1.842745 0.319054 0.412210 0.395366 0.389059 8.558670 7.4 9.640523 12.812349 <pysal.cg.shapes.Polygon object at 0x7f5d0c006...
1141 Colusa California 06 011 06011 6 11 6011 0 2.760524 ... 0.589862 0.261635 0.370084 0.393279 0.380581 10.166113 8.3 8.326229 12.683868 <pysal.cg.shapes.Polygon object at 0x7f5d0c01b...
1172 Sutter California 06 101 06101 6 101 6101 0 0.998602 ... 1.616083 0.268800 0.362052 0.387465 0.389836 9.551564 8.5 10.954617 13.630269 <pysal.cg.shapes.Polygon object at 0x7f5d0c033...
1262 Yolo California 06 113 06113 6 113 6113 0 4.564334 ... 2.248178 0.245169 0.360779 0.384666 0.380016 9.174198 9.5 12.683387 15.338817 <pysal.cg.shapes.Polygon object at 0x7f5d0bf7a...
1281 Napa California 06 055 06055 6 55 6055 0 1.011787 ... 1.086986 0.252514 0.338110 0.354059 0.343060 9.978769 8.4 11.310572 12.655043 <pysal.cg.shapes.Polygon object at 0x7f5d0bf91...
1282 Sonoma California 06 097 06097 6 97 6097 0 3.392706 ... 1.428822 0.286516 0.359706 0.358379 0.339574 10.679100 9.7 13.217968 13.695357 <pysal.cg.shapes.Polygon object at 0x7f5d0bf91...
1301 Sacramento California 06 067 06067 6 67 6067 0 5.370163 ... 9.328393 0.249000 0.337241 0.364755 0.369319 10.244809 11.5 16.006905 19.174287 <pysal.cg.shapes.Polygon object at 0x7f5d0bf91...
1401 Marin California 06 041 06041 6 41 6041 0 1.589248 ... 3.551561 0.257563 0.327787 0.379240 0.304507 9.385675 8.9 13.567974 13.108163 <pysal.cg.shapes.Polygon object at 0x7f5d0bf5c...
1404 San Joaquin California 06 077 06077 6 77 6077 0 4.800211 ... 5.637208 0.269695 0.357810 0.379657 0.380451 11.316301 11.0 14.402091 16.685961 <pysal.cg.shapes.Polygon object at 0x7f5d0bef8...
1447 Solano California 06 095 06095 6 95 6095 0 2.724182 ... 13.465386 0.252932 0.335622 0.337554 0.333782 9.665733 9.9 12.355744 14.541984 <pysal.cg.shapes.Polygon object at 0x7f5d0bf11...
1462 Contra Costa California 06 013 06013 6 13 6013 0 2.200328 ... 9.278839 0.252832 0.321427 0.357907 0.329502 9.251923 9.1 13.023921 14.336422 <pysal.cg.shapes.Polygon object at 0x7f5d0bf11...
1514 Alameda California 06 001 06001 6 1 6001 0 4.073952 ... 17.921531 0.263382 0.335990 0.365478 0.359518 13.515003 12.8 18.040036 19.474298 <pysal.cg.shapes.Polygon object at 0x7f5d0bec6...
1530 San Francisco California 06 075 06075 6 75 6075 0 5.718279 ... 10.917607 0.267329 0.373572 0.405414 0.386516 18.275071 16.5 20.872237 21.048826 <pysal.cg.shapes.Polygon object at 0x7f5d0bee1...
1559 San Mateo California 06 081 06081 6 81 6081 0 2.625339 ... 5.431304 0.246884 0.311345 0.353205 0.316528 9.004051 9.6 12.425667 13.585338 <pysal.cg.shapes.Polygon object at 0x7f5d0be77...
1609 Santa Clara California 06 085 06085 6 85 6085 0 1.920138 ... 3.753463 0.255751 0.314337 0.342757 0.321904 9.456542 9.4 13.471862 14.161957 <pysal.cg.shapes.Polygon object at 0x7f5d0be8e...
1664 Santa Cruz California 06 087 06087 6 87 6087 0 1.583174 ... 1.145673 0.312758 0.379682 0.388574 0.355819 11.134820 10.9 13.014953 13.947277 <pysal.cg.shapes.Polygon object at 0x7f5d0be46...
1731 San Benito California 06 069 06069 6 69 6069 0 2.165065 ... 0.555904 0.268587 0.367888 0.362120 0.356692 12.671045 9.2 9.988866 12.592351 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...
1745 Monterey California 06 053 06053 6 53 6053 0 3.697150 ... 6.424394 0.262592 0.356412 0.379990 0.368213 10.793717 10.9 12.932905 13.794950 <pysal.cg.shapes.Polygon object at 0x7f5d0bdfa...

69 rows × 70 columns
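When the condition grows beyond a one-liner, the lambda can be swapped for a named function. The sketch below selects geometries whose centroid falls inside a bounding box. `ToyPolygon` and `in_box` are hypothetical names, and the stand-in class exposes only a `centroid` attribute so the example runs without PySAL.

```python
import pandas as pd

class ToyPolygon(object):
    """Hypothetical stand-in exposing only a `centroid` attribute."""

    def __init__(self, centroid):
        self.centroid = centroid

toy = pd.DataFrame({
    "NAME": ["A", "B", "C"],
    "geometry": [ToyPolygon((-122.5, 45.0)),
                 ToyPolygon((-100.0, 40.0)),
                 ToyPolygon((-121.5, 33.0))],
})

def in_box(poly, xmin=-125.0, xmax=-121.0, ymin=40.0, ymax=50.0):
    """True when the polygon's centroid lies inside the bounding box."""
    x, y = poly.centroid
    return (xmin <= x <= xmax) and (ymin <= y <= ymax)

selected = toy[toy.geometry.apply(in_box)]
print(selected.NAME.tolist())
```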

The same pattern works for any spatial query that can be written as a function of a single geometry.

For instance, if we wanted to find all of the counties that are within a threshold distance from an observation's centroid, we can do it in the following way.

First, specify the observation. Here, we'll use Cook County, IL:


In [34]:
data_table.query('(NAME == "Cook") & (STATE_NAME == "Illinois")')


Out[34]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
3044 Cook Illinois 17 031 17031 17 31 17031 0 7.108893 ... 25.800778 0.259469 0.333628 0.380441 0.381362 14.615451 13.5 19.773258 22.473766 <pysal.cg.shapes.Polygon object at 0x7f5d0b9c7...

1 rows × 70 columns
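Both conditions matter in this query because county names are not unique across states: the NAT data contains a Lake County in Indiana and another in Illinois, for instance. A minimal sketch of the difference, assuming a hypothetical three-row table (`toy`) whose NAME, STATE_NAME, and FIPS values are copied from rows printed in this notebook:

```python
import pandas as pd

# Hypothetical stand-in for data_table; values copied from rows
# printed in this notebook's output.
toy = pd.DataFrame({
    "NAME": ["Cook", "Lake", "Lake"],
    "STATE_NAME": ["Illinois", "Indiana", "Illinois"],
    "FIPS": ["17031", "18089", "17097"],
})

# Filtering on the county name alone is ambiguous ...
by_name = toy.query('NAME == "Lake"')

# ... while adding the state narrows the result to a single row.
by_name_and_state = toy.query('(NAME == "Lake") & (STATE_NAME == "Illinois")')
```

Here `by_name` holds two rows and `by_name_and_state` holds one, which is why the Cook County query above pins down the state as well as the name.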


In [35]:
geom = data_table.query('(NAME == "Cook") & (STATE_NAME == "Illinois")').geometry

In [36]:
geom.values[0].centroid


Out[36]:
(-87.82107391263027, 41.84346628270174)

In [37]:
cook_county_centroid = geom.values[0].centroid

In [44]:
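import scipy.spatial.distance as d
# True when a polygon's centroid lies within `threshold` of `target`.
# Coordinates are longitude/latitude, so the threshold is in decimal degrees.
def near_target_point(polygon, target=cook_county_centroid, threshold=1):
    return d.euclidean(polygon.centroid, target) < threshold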

In [45]:
data_table[data_table.geometry.apply(near_target_point)]


Out[45]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
631 Will Illinois 17 197 17197 17 197 17197 0 2.783330 ... 10.735965 0.240381 0.289511 0.303416 0.315735 9.818182 7.5 10.259602 11.559338 <pysal.cg.shapes.Polygon object at 0x7f5d0c138...
633 Kendall Illinois 17 093 17093 17 93 17093 0 0.000000 ... 0.532819 0.257018 0.267486 0.281763 0.289925 6.638544 4.5 7.263397 8.099082 <pysal.cg.shapes.Polygon object at 0x7f5d0c138...
634 Lake Indiana 18 089 18089 18 89 18089 0 6.299491 ... 24.535213 0.234316 0.306393 0.341476 0.369924 10.490816 10.5 16.840574 20.769121 <pysal.cg.shapes.Polygon object at 0x7f5d0c138...
635 Porter Indiana 18 127 18127 18 127 18127 0 1.658953 ... 0.352124 0.236710 0.289391 0.302400 0.325637 7.351241 6.3 8.609209 10.320745 <pysal.cg.shapes.Polygon object at 0x7f5d0c138...
686 Grundy Illinois 17 063 17063 17 63 17063 0 0.000000 ... 0.064941 0.245480 0.292372 0.311057 0.316300 9.877800 5.4 6.726884 8.843309 <pysal.cg.shapes.Polygon object at 0x7f5d0c15b...
718 Kankakee Illinois 17 091 17091 17 91 17091 0 1.810355 ... 14.959223 0.236170 0.317750 0.350920 0.358303 9.825252 8.7 14.462763 16.494021 <pysal.cg.shapes.Polygon object at 0x7f5d0c172...
731 Newton Indiana 18 111 18111 18 111 18111 0 2.898047 ... 0.066416 0.301172 0.319702 0.327321 0.338931 10.296097 7.9 8.909409 8.903021 <pysal.cg.shapes.Polygon object at 0x7f5d0c172...
3010 Racine Wisconsin 55 101 55101 55 101 55101 0 2.351044 ... 9.711827 0.232056 0.304189 0.323170 0.343930 9.602611 8.7 12.519269 16.087149 <pysal.cg.shapes.Polygon object at 0x7f5d0b9b7...
3019 Kenosha Wisconsin 55 059 55059 55 59 55059 0 0.993888 ... 4.130877 0.237876 0.300109 0.319694 0.341010 9.385645 8.7 12.931595 14.615025 <pysal.cg.shapes.Polygon object at 0x7f5d0b9b7...
3028 McHenry Illinois 17 111 17111 17 111 17111 0 1.583343 ... 0.169176 0.257156 0.306415 0.315220 0.302329 8.096341 6.4 8.108716 8.182941 <pysal.cg.shapes.Polygon object at 0x7f5d0b9b7...
3030 Lake Illinois 17 097 17097 17 97 17097 0 1.135115 ... 6.733112 0.249866 0.330083 0.362992 0.317872 7.763546 7.0 9.338688 10.465916 <pysal.cg.shapes.Polygon object at 0x7f5d0b9b7...
3044 Cook Illinois 17 031 17031 17 31 17031 0 7.108893 ... 25.800778 0.259469 0.333628 0.380441 0.381362 14.615451 13.5 19.773258 22.473766 <pysal.cg.shapes.Polygon object at 0x7f5d0b9c7...
3045 Kane Illinois 17 089 17089 17 89 17089 0 0.960403 ... 5.986689 0.234401 0.290364 0.319262 0.326945 9.189593 7.8 10.994243 12.065605 <pysal.cg.shapes.Polygon object at 0x7f5d0b9c7...
3046 De Kalb Illinois 17 037 17037 17 37 17037 0 1.289142 ... 2.654879 0.260709 0.322043 0.325077 0.329307 9.609443 6.0 8.868464 10.425777 <pysal.cg.shapes.Polygon object at 0x7f5d0b9c7...
3059 Du Page Illinois 17 043 17043 17 43 17043 0 0.850723 ... 1.978083 0.241338 0.272934 0.303349 0.283952 6.010936 5.2 8.037908 9.411177 <pysal.cg.shapes.Polygon object at 0x7f5d0b9c7...

15 rows × 70 columns
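The target and threshold are ordinary keyword arguments, so they can be changed without redefining the function: pandas' `Series.apply` forwards extra keyword arguments to the function it calls. The sketch below makes three assumptions. A hypothetical `Shape` class stands in for a PySAL polygon (only its `.centroid` attribute is used). The centroids for the rows other than Cook are made up. And `math.hypot` replaces `scipy.spatial.distance.euclidean` so that the example runs without SciPy.

```python
import math

import pandas as pd


class Shape(object):
    """Hypothetical stand-in for a PySAL polygon; only .centroid is used."""
    def __init__(self, centroid):
        self.centroid = centroid


# Cook County's centroid is taken from the output above;
# the other two points are made up for illustration.
target = (-87.82107391263027, 41.84346628270174)
toy = pd.DataFrame({
    "NAME": ["Cook", "Nearby", "Far"],
    "geometry": [Shape(target), Shape((-88.3, 41.9)), Shape((-95.0, 45.0))],
})


def near_target_point(polygon, target=target, threshold=1):
    # Straight-line distance in coordinate units (decimal degrees here).
    dx = polygon.centroid[0] - target[0]
    dy = polygon.centroid[1] - target[1]
    return math.hypot(dx, dy) < threshold


# Default threshold of 1 degree.
mask = toy.geometry.apply(near_target_point)
# The threshold keyword is forwarded to near_target_point by apply.
wide_mask = toy.geometry.apply(near_target_point, threshold=10)

near = toy[mask].NAME.tolist()
wide = toy[wide_mask].NAME.tolist()
```

With the default threshold only the first two rows are selected; widening it to 10 degrees picks up the third as well.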

Moving in and out of the dataframe

Most PySAL functions are explicit about the type of input they expect, and most of the time that is a list or an array. This is why the file-handle methods are the default IO method in PySAL: the rest of the computational tools are built around the datatypes they return.

However, it is very easy to get the correct datatype from Pandas using the .values attribute and the tolist() method.

tolist() converts the entries of a column to a list. It can only be called on individual columns (called Series in the pandas documentation).

So, to turn the NAME column into a list:


In [46]:
data_table.NAME.tolist()


Out[46]:
[u'Lake of the Woods',
 u'Ferry',
 u'Stevens',
 u'Okanogan',
 u'Pend Oreille',
 u'Boundary',
 u'Lincoln',
 u'Flathead',
 u'Glacier',
 u'Toole',
 u'Liberty',
 u'Hill',
 u'Sheridan',
 u'Divide',
 u'Burke',
 u'Renville',
 u'Bottineau',
 u'Rolette',
 u'Towner',
 u'Cavalier',
 u'Pembina',
 u'Kittson',
 u'Roseau',
 u'Blaine',
 u'Phillips',
 u'Valley',
 u'Daniels',
 u'Whatcom',
 u'Bonner',
 u'Ward',
 u'Koochiching',
 u'Skagit',
 u'Williams',
 u'McHenry',
 u'St. Louis',
 u'Roosevelt',
 u'Mountrial',
 u'Marshall',
 u'Ramsey',
 u'Walsh',
 u'Beltrami',
 u'Pierce',
 u'Chelan',
 u'Pondera',
 u'Clallam',
 u'Benson',
 u'Chouteau',
 u'Snohomish',
 u'Island',
 u'Sanders',
 u'Lake',
 u'Nelson',
 u'Grand Forks',
 u'Polk',
 u'Pennington',
 u'Douglas',
 u'McKenzie',
 u'Jefferson',
 u'Richland',
 u'Teton',
 u'McCone',
 u'Shoshone',
 u'Spokane',
 u'Lake',
 u'Clearwater',
 u'Kootenai',
 u'Garfield',
 u'Red Lake',
 u'Grant',
 u'Lincoln',
 u'Lewis and Clark',
 u'Kitsap',
 u'Itasca',
 u'Sheridan',
 u'Wells',
 u'McLean',
 u'Eddy',
 u'Dunn',
 u'Fergus',
 u'Dawson',
 u'King',
 u'Cascade',
 u'Griggs',
 u'Steele',
 u'Traill',
 u'Mason',
 u'Missoula',
 u'Petroleum',
 u'Powell',
 u'Kittitas',
 u'Foster',
 u'Mercer',
 u'Grays Harbor',
 u'Norman',
 u'Mahnomen',
 u'Mineral',
 u'Cass',
 u'Aroostook',
 u'Judith Basin',
 u'Hubbard',
 u'Benewah',
 u'Wibaux',
 u'Golden Valley',
 u'Billings',
 u'Stutsman',
 u'Kidder',
 u'Burleigh',
 u'Pierce',
 u'Oliver',
 u'Adams',
 u'Whitman',
 u'Barnes',
 u'Cass',
 u'Prairie',
 u'Becker',
 u'Clay',
 u'Thurston',
 u'Latah',
 u'Meagher',
 u'Yakima',
 u'Aitkin',
 u'Stark',
 u'Morton',
 u'Bayfield',
 u'Clearwater',
 u'Custer',
 u'Rosebud',
 u'Granite',
 u'Wadena',
 u'Crow Wing',
 u'Pacific',
 u'Lewis',
 u'Broadwater',
 u'Carlton',
 u'Golden Valley',
 u'Douglas',
 u'Musselshell',
 u'Wheatland',
 u'Franklin',
 u'Benton',
 u'Grant',
 u'Otter Tail',
 u'Garfield',
 u'Fallon',
 u'Idaho',
 u'Ravalli',
 u'Ashland',
 u'Logan',
 u'Emmons',
 u'La Moure',
 u'Slope',
 u'Hettinger',
 u'Ransom',
 u'Richland',
 u'Wilkin',
 u'Nez Perce',
 u'Columbia',
 u'Walla Walla',
 u'Iron',
 u'Somerset',
 u'Piscataquis',
 u'Jefferson',
 u'Yellowstone',
 u'Treasure',
 u'Lewis',
 u'Asotin',
 u'Sioux',
 u'Pine',
 u'Penobscot',
 u'Skamania',
 u'Cowlitz',
 u'Wahkiakum',
 u'Todd',
 u'Morrison',
 u'McIntosh',
 u'Dickey',
 u'Sargent',
 u'Bowman',
 u'Adams',
 u'Deer Lodge',
 u'Mille Lacs',
 u'Clatsop',
 u'Sweet Grass',
 u'Columbia',
 u'Silver Bow',
 u'Washburn',
 u'Sawyer',
 u'Burnett',
 u'Kanabec',
 u'Carter',
 u'Stillwater',
 u'Grant',
 u'Douglas',
 u'Clark',
 u'Klickitat',
 u'Big Horn',
 u'Traverse',
 u'Umatilla',
 u'Wallowa',
 u'Price',
 u'Campbell',
 u'Harding',
 u'McPherson',
 u'Perkins',
 u'Corson',
 u'Brown',
 u'Beaverhead',
 u'Marshall',
 u'Roberts',
 u'Morrow',
 u'Union',
 u'Madison',
 u'Benton',
 u'Gilliam',
 u'Powder River',
 u'Stearns',
 u'Tillamook',
 u'Washington',
 u'Pope',
 u'Stevens',
 u'Isanti',
 u'Chisago',
 u'Sherman',
 u'Polk',
 u'Multnomah',
 u'Hood River',
 u'Wasco',
 u'Lemhi',
 u'Washington',
 u'Franklin',
 u'Barron',
 u'Rusk',
 u'Carbon',
 u'Walworth',
 u'Edmunds',
 u'Day',
 u'Big Stone',
 u'Sherburne',
 u'Ziebach',
 u'Dewey',
 u'Clackamas',
 u'Wright',
 u'Yamhill',
 u'Anoka',
 u'Kandiyohi',
 u'Swift',
 u'Taylor',
 u'Oxford',
 u'Grant',
 u'Meeker',
 u'Coos',
 u'Washington',
 u'Chippewa',
 u'Marion',
 u'Lac Qui Parle',
 u'Adams',
 u'Potter',
 u'Faulk',
 u'Hennepin',
 u'Spink',
 u'St. Croix',
 u'Dunn',
 u'Butte',
 u'Valley',
 u'Clark',
 u'Codington',
 u'Chippewa',
 u'Ramsey',
 u'Baker',
 u'Polk',
 u'Wheeler',
 u'Meade',
 u'Lincoln',
 u'Clark',
 u'Essex',
 u'Grand Isle',
 u'Franklin',
 u'Orleans',
 u'Clinton',
 u'Park',
 u'Crook',
 u'Big Horn',
 u'Campbell',
 u'Sheridan',
 u'Franklin',
 u'Grant',
 u'McLeod',
 u'Carver',
 u'Deuel',
 u'Dakota',
 u'Yellow Medicine',
 u'Sully',
 u'Hyde',
 u'Hand',
 u'Renville',
 u'Pierce',
 u'Custer',
 u'Eau Claire',
 u'Washington',
 u'Jefferson',
 u'Scott',
 u'Hamlin',
 u'Lamoille',
 u'Linn',
 u'Stanley',
 u'Caledonia',
 u'Waldo',
 u'Haakon',
 u'Fremont',
 u'Sibley',
 u'Chittenden',
 u'Kennebec',
 u'Goodhue',
 u'Benton',
 u'Redwood',
 u'Wood',
 u'Pepin',
 u'Teton',
 u'Beadle',
 u'Lyon',
 u'Lincoln',
 u'Lawrence',
 u'Buffalo',
 u'Trempealeau',
 u'Jackson',
 u'Clark',
 u'Crook',
 u'Johnson',
 u'Hughes',
 u'Essex',
 u'Le Sueur',
 u'Rice',
 u'Kingsbury',
 u'Brookings',
 u'Pennington',
 u'Gem',
 u'Brown',
 u'Washington',
 u'Androscoggin',
 u'Nicollet',
 u'Wabasha',
 u'Malheur',
 u'Hancock',
 u'Grafton',
 u'Deschutes',
 u'Knox',
 u'Boise',
 u'Lincoln',
 u'Addison',
 u'Carroll',
 u'Lane',
 u'Blue Earth',
 u'Juneau',
 u'Butte',
 u'Lyman',
 u'Orange',
 u'Waseca',
 u'Buffalo',
 u'Jerauld',
 u'Moody',
 u'Pipestone',
 u'Dodge',
 u'Sanborn',
 u'Murray',
 u'Cottonwood',
 u'Steele',
 u'Olmsted',
 u'Miner',
 u'Lake',
 u'Winona',
 u'Weston',
 u'Jones',
 u'Cumberland',
 u'Washakie',
 u'Monroe',
 u'Payette',
 u'Hamilton',
 u'Watonwan',
 u'Herkimer',
 u'La Crosse',
 u'Elmore',
 u'Hot Springs',
 u'Jefferson',
 u'Harney',
 u'Sagadahoc',
 u'Fremont',
 u'Jackson',
 u'Blaine',
 u'Teton',
 u'Windsor',
 u'Douglas',
 u'Aurora',
 u'Brule',
 u'Madison',
 u'Canyon',
 u'Mellette',
 u'Camas',
 u'Custer',
 u'Rutland',
 u'Faribault',
 u'Minnehaha',
 u'Rock',
 u'Freeborn',
 u'Nobles',
 u'Jackson',
 u'Martin',
 u'Houston',
 u'Mower',
 u'Fillmore',
 u'Davison',
 u'Hanson',
 u'McCook',
 u'York',
 u'Washington',
 u'Ada',
 u'Warren',
 u'Tripp',
 u'Belknap',
 u'Vernon',
 u'Shannon',
 u'Owyhee',
 u'Bonneville',
 u'Bingham',
 u'Klamath',
 u'Lake',
 u'Coos',
 u'Merrimack',
 u'Sullivan',
 u'Strafford',
 u'Richland',
 u'Natrona',
 u'Gregory',
 u'Niobrara',
 u'Charles Mix',
 u'Turner',
 u'Lincoln',
 u'Worth',
 u'Mitchell',
 u'Allamakee',
 u'Winnebago',
 u'Winneshiek',
 u'Converse',
 u'Osceola',
 u'Dickinson',
 u'Kossuth',
 u'Howard',
 u'Emmet',
 u'Lyon',
 u'Douglas',
 u'Hutchinson',
 u'Fall River',
 u'Sublette',
 u'Crawford',
 u'Saratoga',
 u'Bennett',
 u'Todd',
 u'Bennington',
 u'Lincoln',
 u'Fulton',
 u'Rockingham',
 u'Sioux',
 u'Windham',
 u"O'Brien",
 u'Cerro Gordo',
 u'Clay',
 u'Hancock',
 u'Palo Alto',
 u'Floyd',
 u'Chickasaw',
 u'Iowa',
 u'Grant',
 u'Hillsborough',
 u'Lincoln',
 u'Gooding',
 u'Minidoka',
 u'Cheshire',
 u'Yankton',
 u'Bon Homme',
 u'Power',
 u'Union',
 u'Clay',
 u'Fayette',
 u'Clayton',
 u'Montgomery',
 u'Caribou',
 u'Bannock',
 u'Sioux',
 u'Dawes',
 u'Sheridan',
 u'Jackson',
 u'Keya Paha',
 u'Boyd',
 u'Cherry',
 u'Curry',
 u'Rensselaer',
 u'Schenectady',
 u'Plymouth',
 u'Cherokee',
 u'Bremer',
 u'Butler',
 u'Buena Vista',
 u'Twin Falls',
 u'Pocahontas',
 u'Humboldt',
 u'Wright',
 u'Franklin',
 u'Otsego',
 u'Holt',
 u'Essex',
 u'Knox',
 u'Cedar',
 u'Jerome',
 u'Schoharie',
 u'Brown',
 u'Lafayette',
 u'Albany',
 u'Rock',
 u'Josephine',
 u'Dixon',
 u'Berkshire',
 u'Franklin',
 u'Middlesex',
 u'Worcester',
 u'Dubuque',
 u'Cassia',
 u'Webster',
 u'Delaware',
 u'Buchanan',
 u'Black Hawk',
 u'Goshen',
 u'Platte',
 u'Bear Lake',
 u'Woodbury',
 u'Ida',
 u'Sac',
 u'Calhoun',
 u'Hamilton',
 u'Hampshire',
 u'Hardin',
 u'Grundy',
 u'Dakota',
 u'Delaware',
 u'Jo Daviess',
 u'Columbia',
 u'Oneida',
 u'Greene',
 u'Suffolk',
 u'Carbon',
 u'Pierce',
 u'Box Butte',
 u'Antelope',
 u'Albany',
 u'Franklin',
 u'Jackson',
 u'Wayne',
 u'Hampden',
 u'Jones',
 u'Benton',
 u'Linn',
 u'Tama',
 u'Thurston',
 u'Sweetwater',
 u'Plymouth',
 u'Norfolk',
 u'Monona',
 u'Crawford',
 u'Carroll',
 u'Greene',
 u'Boone',
 u'Marshall',
 u'Story',
 u'Ulster',
 u'Cuming',
 u'Bristol',
 u'Stanton',
 u'Madison',
 u'Grant',
 u'Loup',
 u'Hooker',
 u'Garfield',
 u'Thomas',
 u'Wheeler',
 u'Blaine',
 u'Dutchess',
 u'Barnstable',
 u'Burt',
 u'Litchfield',
 u'Hartford',
 u'Clinton',
 u'Tolland',
 u'Windham',
 u'Sullivan',
 u'Providence',
 u'Cache',
 u'Siskiyou',
 u'Garden',
 u'Box Elder',
 u'Rich',
 u'Scotts Bluff',
 u'Morrill',
 u'Wayne',
 u'Del Norte',
 u'Humboldt',
 u'Elko',
 u'Modoc',
 u'Washoe',
 u'Cedar',
 u'Boone',
 u'Jasper',
 u'Polk',
 u'Poweshiek',
 u'Harrison',
 u'Guthrie',
 u'Shelby',
 u'Audubon',
 u'Iowa',
 u'Dallas',
 u'Johnson',
 u'Rock Island',
 u'Scott',
 u'Bristol',
 u'Kent',
 u'Colfax',
 u'Dodge',
 u'Platte',
 u'Arthur',
 u'Greeley',
 u'McPherson',
 u'Logan',
 u'Custer',
 u'Valley',
 u'Will',
 u'Lucas',
 u'Kendall',
 u'Lake',
 u'Porter',
 u'Fulton',
 u'Geauga',
 u'New London',
 u'Williams',
 u'Banner',
 u'Washington',
 u'Newport',
 u'Fairfield',
 u'Laramie',
 u'Wyoming',
 u'Washington',
 u'Middlesex',
 u'New Haven',
 u'Lackawanna',
 u'La Salle',
 u'Orange',
 u'Elk',
 u'Cuyahoga',
 u'Venango',
 u'Forest',
 u'Ottawa',
 u'Cameron',
 u'Wood',
 u'Pike',
 u'Lycoming',
 u'Muscatine',
 u'Sullivan',
 u'Bureau',
 u'Henry',
 u'Uinta',
 u'Noble',
 u'De Kalb',
 u'Putnam',
 u'Nance',
 u'Lorain',
 u'Mahaska',
 u'Pottawattamie',
 u'Washington',
 u'Marion',
 u'Madison',
 u'Warren',
 u'Keokuk',
 u'Cass',
 u'Adair',
 u'Sandusky',
 u'Trumbull',
 u'Mercer',
 u'Marshall',
 u'Henry',
 u'Clinton',
 u'Grundy',
 u'Humboldt',
 u'Butler',
 u'Saunders',
 u'Erie',
 u'Kosciusko',
 u'Starke',
 u'Clarion',
 u'Cheyenne',
 u'Defiance',
 u'Weber',
 u'Louisa',
 u'Luzerne',
 u'Polk',
 u'Douglas',
 u'Sherman',
 u'Howard',
 u'Lincoln',
 u'Merrick',
 u'Keith',
 u'Kimball',
 u'Jefferson',
 u'Morgan',
 u'Trinity',
 u'Westchester',
 u'Sussex',
 u'Summit',
 u'Portage',
 u'Mercer',
 u'Rockland',
 u'Putnam',
 u'Columbia',
 u'Kankakee',
 u'Whitley',
 u'Jasper',
 u'Huron',
 u'Allen',
 u'Medina',
 u'Summit',
 u'Centre',
 u'Clearfield',
 u'Monroe',
 u'Seneca',
 u'Paulding',
 u'Stark',
 u'Newton',
 u'Deuel',
 u'Passaic',
 u'Sarpy',
 u'Shasta',
 u'Lassen',
 u'Northumberland',
 u'Fulton',
 u'Butler',
 u'Armstrong',
 u'Pulaski',
 u'Montour',
 u'Hancock',
 u'Mills',
 u'Putnam',
 u'Montgomery',
 u'Adams',
 u'Clarke',
 u'Wapello',
 u'Jefferson',
 u'Union',
 u'Henry',
 u'Hamilton',
 u'Lucas',
 u'Monroe',
 u'Davis',
 u'Knox',
 u'Marshall',
 u'Union',
 u'Carbon',
 u'Mahoning',
 u'Lawrence',
 u'Bergen',
 u'Livingston',
 u'Warren',
 u'Tooele',
 u'Morris',
 u'Des Moines',
 u'Henderson',
 u'Warren',
 u'Cass',
 u'Ashland',
 u'Wabash',
 u'Seward',
 u'Lancaster',
 u'Buffalo',
 u'Dawson',
 u'York',
 u'Hall',
 u'Huntington',
 u'Iroquois',
 u'Miami',
 u'Ford',
 u'Moffat',
 u'Weld',
 u'Jackson',
 u'Perkins',
 u'Logan',
 u'Sedgwick',
 u'Routt',
 u'Larimer',
 u'Daggett',
 u'Crawford',
 u'Lander',
 u'Eureka',
 u'Richland',
 u'Wayne',
 u'Van Wert',
 u'Wyandot',
 u'Stark',
 u'Peoria',
 u'Northampton',
 u'Pershing',
 u'Schuylkill',
 u'Adams',
 u'Wells',
 u'Woodford',
 u'Columbiana',
 u'Cass',
 u'White',
 u'Salt Lake',
 u'Allen',
 u'Indiana',
 u'Fremont',
 u'Page',
 u'Taylor',
 u'Ringgold',
 u'Nassau',
 u'Davis',
 u'Van Buren',
 u'Decatur',
 u'Wayne',
 u'Essex',
 u'Appanoose',
 u'Snyder',
 u'Uintah',
 u'Beaver',
 u'Mifflin',
 u'Duchesne',
 u'Hudson',
 u'Lee',
 u'Hardin',
 u'Otoe',
 u'Lehigh',
 u'Hunterdon',
 u'Suffolk',
 u'McLean',
 u'Somerset',
 u'Carroll',
 u'Tazewell',
 u'Blair',
 u'Benton',
 u'Phillips',
 u'Huntingdon',
 u'Cambria',
 u'Union',
 u'Mercer',
 u'Carroll',
 u'Fulton',
 u'Morrow',
 u'Marion',
 u'Saline',
 u'Adams',
 u'Clay',
 u'Fillmore',
 u'Juniata',
 u'Hayes',
 u'Chase',
 u'Frontier',
 u'Gosper',
 u'Auglaize',
 u'Kearney',
 u'Wasatch',
 u'Phelps',
 u'Berks',
 u'Westmoreland',
 u'Allegheny',
 u'Grant',
 u'Holmes',
 u'Dauphin',
 u'Tuscarawas',
 u'Hancock',
 u'McDonough',
 u'Perry',
 u'Hancock',
 u'Bucks',
 u'Clark',
 u'Scotland',
 u'Schuyler',
 u'Middlesex',
 u'Jefferson',
 u'Blackford',
 u'Putnam',
 u'Atchison',
 u'Jay',
 u'Utah',
 u'Nodaway',
 u'Mercer',
 u'Howard',
 u'Harrison',
 u'Tippecanoe',
 u'Worth',
 u'Knox',
 u'Nemaha',
 u'Lebanon',
 u'Logan',
 u'Gage',
 u'Johnson',
 u'Morgan',
 u'Union',
 u'Vermilion',
 u'Warren',
 u'Shelby',
 u'Washington',
 u'Grand',
 u'Coshocton',
 u'Tehama',
 u'Monmouth',
 u'Plumas',
 u'Delaware',
 u'Montgomery',
 u'Mason',
 u'Clinton',
 u'Yuma',
 u'Washington',
 u'Harrison',
 u'Mercer',
 u'Tipton',
 u'Champaign',
 u'Brooke',
 u'Madison',
 u'Delaware',
 u'Gentry',
 u'Sullivan',
 u'Fountain',
 u'Darke',
 u'Thayer',
 u'Jefferson',
 u'Adair',
 u'Dundy',
 u'Franklin',
 u'Webster',
 u'Nuckolls',
 u'Hitchcock',
 u'Harlan',
 u'Furnas',
 u'Red Willow',
 u'Logan',
 u'Bedford',
 u'Cumberland',
 u'Randolph',
 u'Lancaster',
 u'Knox',
 u'Franklin',
 u'Piatt',
 u'De Witt',
 u'Somerset',
 u'Schuyler',
 u'Licking',
 u'Champaign',
 u'Holt',
 u'Richardson',
 u'Pawnee',
 u'Grundy',
 u'Boulder',
 u'Lewis',
 u'Chester',
 u'Hamilton',
 u'York',
 u'Montgomery',
 u'Rio Blanco',
 u'Guernsey',
 u'Adams',
 u'Miami',
 u'Ohio',
 u'Boone',
 u'Burlington',
 u'Vermillion',
 u'Menard',
 u'Belmont',
 u'Ocean',
 u'Fulton',
 u'Muskingum',
 u'Butte',
 u'Daviess',
 u'Franklin',
 u'Fayette',
 u'Philadelphia',
 u'Andrew',
 u'Cass',
 u'White Pine',
 u'Madison',
 u'Brown',
 u'Garfield',
 u'Henry',
 u'Adams',
 u'Delaware',
 u'Macon',
 u'De Kalb',
 u'Macon',
 u'Clark',
 u'Linn',
 u'Marshall',
 u'Greene',
 u'Wayne',
 u'Juab',
 u'Churchill',
 u'Norton',
 u'Phillips',
 ...]
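As a small illustration of the column-only restriction, the sketch below builds a hypothetical two-column table (`toy`, with values copied from the first rows of this notebook's output) and calls `tolist()` on a single column. A dataframe has no `tolist()` method of its own; going through `.values` first is one way to get nested lists for several columns.

```python
import pandas as pd

# Hypothetical stand-in for data_table, using the first three rows.
toy = pd.DataFrame({
    "NAME": ["Lake of the Woods", "Ferry", "Stevens"],
    "HR60": [0.0, 0.0, 1.863863],
})

# A single column (a Series) converts directly to a plain Python list.
names = toy.NAME.tolist()

# For several columns, convert the underlying array instead:
# this gives one inner list per row.
rows = toy[["NAME", "HR60"]].values.tolist()

# The dataframe itself does not provide tolist().
has_df_tolist = hasattr(toy, "tolist")
```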

To extract several columns at once, select the columns you want and use the .values attribute of the resulting dataframe.

If we were interested in grabbing all of the HR variables in the dataframe, we could first select those column names:


In [47]:
HRs = [col for col in data_table.columns if col.startswith('HR')]

We can use this to focus only on the columns we want:


In [48]:
data_table[HRs]


Out[48]:
HR60 HR70 HR80 HR90
0 0.000000 0.000000 8.855827 0.000000
1 0.000000 0.000000 17.208742 15.885624
2 1.863863 1.915158 3.450775 6.462453
3 2.612330 1.288643 3.263814 6.996502
4 0.000000 0.000000 7.770008 7.478033
5 0.000000 0.000000 4.573101 4.000640
6 7.976390 5.536179 5.633168 5.720497
7 1.011173 1.689475 4.490115 2.814460
8 11.529039 9.273857 28.227324 5.500096
9 0.000000 5.708740 0.000000 6.605892
10 0.000000 0.000000 0.000000 0.000000
11 3.574045 3.840688 7.413585 1.888146
12 0.000000 0.000000 0.000000 0.000000
13 0.000000 0.000000 0.000000 0.000000
14 0.000000 0.000000 0.000000 0.000000
15 0.000000 0.000000 0.000000 0.000000
16 2.945942 0.000000 0.000000 0.000000
17 0.000000 0.000000 19.161808 5.219752
18 0.000000 0.000000 0.000000 0.000000
19 0.000000 8.117213 0.000000 0.000000
20 0.000000 0.000000 0.000000 0.000000
21 0.000000 4.864050 0.000000 0.000000
22 0.000000 0.000000 0.000000 2.218377
23 4.119804 9.910312 14.287755 14.863258
24 5.530668 0.000000 12.421589 0.000000
25 5.854801 5.811757 6.504065 4.045798
26 0.000000 10.811980 0.000000 0.000000
27 1.422131 2.439530 4.061193 2.086920
28 2.138534 2.142245 5.518079 3.756292
29 0.708135 1.138434 0.570854 1.150993
... ... ... ... ...
3055 0.000000 0.000000 2.107526 0.739919
3056 0.000000 0.970572 4.400324 0.825491
3057 0.611430 2.568301 1.974919 2.828994
3058 2.022286 2.033140 0.000000 3.987956
3059 0.850723 1.981012 2.782690 2.260130
3060 1.790825 3.732470 5.757329 4.007173
3061 0.556604 1.060271 1.010560 2.769193
3062 0.000000 1.756836 5.505395 2.907653
3063 0.427592 3.688132 2.625587 0.000000
3064 0.896660 2.704530 4.229303 2.938915
3065 1.051403 5.379304 7.057466 3.424679
3066 2.095434 6.671377 9.243279 7.285916
3067 1.872835 3.951663 5.339935 3.201065
3068 1.917913 6.382639 1.304631 0.000000
3069 1.939789 4.960564 0.000000 6.072530
3070 5.452765 15.156192 24.841012 28.268787
3071 7.520089 9.163383 10.590286 6.443839
3072 7.202448 9.746302 11.850014 12.561604
3073 8.253379 15.655752 21.173432 16.479507
3074 2.181802 3.074760 3.191133 3.300700
3075 4.902862 11.782264 7.680787 18.362582
3076 18.513376 17.133324 15.034136 12.027015
3077 4.159907 4.126434 3.967782 6.585273
3078 5.403098 5.970974 4.127839 2.586787
3079 1.121183 1.096311 2.442074 2.806112
3080 5.046682 13.152054 13.251761 5.521552
3081 3.411368 7.393533 11.453817 8.691999
3082 1.544425 6.023552 5.280349 4.367330
3083 9.302820 1.800148 3.000030 3.727712
3084 3.396162 2.284879 1.194743 2.048855

3085 rows × 4 columns

With this, calling .values gives an array containing all of the entries in this subset of the table:


In [49]:
data_table[HRs].values


Out[49]:
array([[  0.        ,   0.        ,   8.85582713,   0.        ],
       [  0.        ,   0.        ,  17.20874204,  15.88562351],
       [  1.86386342,   1.91515848,   3.4507747 ,   6.46245315],
       ..., 
       [  1.5444254 ,   6.02355209,   5.2803488 ,   4.36732988],
       [  9.30282008,   1.80014761,   3.00003   ,   3.72771194],
       [  3.39616234,   2.28487867,   1.19474313,   2.04885495]])
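The same prefix-selection pattern can be sketched on a hypothetical three-row table (`toy`) whose HR values are copied from the first rows of the output above. The array returned by `.values` has one row per observation and one column per selected variable. On newer pandas versions `to_numpy()` is the recommended spelling, though `.values` still works.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_table, using the first three rows.
toy = pd.DataFrame({
    "NAME": ["Lake of the Woods", "Ferry", "Stevens"],
    "HR60": [0.0, 0.0, 1.863863],
    "HR70": [0.0, 0.0, 1.915158],
    "HR80": [8.855827, 17.208742, 3.450775],
    "HR90": [0.0, 15.885624, 6.462453],
})

# Pick out the homicide-rate columns by their shared prefix.
HRs = [col for col in toy.columns if col.startswith("HR")]

# A 2-D NumPy array: rows are observations, columns follow the order of HRs.
hr_array = toy[HRs].values
```

The NAME column is left out of `hr_array`, so the result is purely numeric and can be passed to functions that expect an array.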

Using the PySAL pdio tools means that if you're comfortable with working in Pandas, you can continue to do so.

If you're more comfortable using NumPy or plain Python for your data processing, PySAL's IO tools support that as well.

